Usage

  • This R Markdown file imports the data tables in the Analysis_Results-Date folder and knits without any modification.
  • The ggsave code is kept so that users can make customized plots.


Strain Issues

All strain names were converted to the corresponding isotype name, which can be looked up here: https://elegansvariation.org/strains/isotype_list. If you submitted replicate data, replicates for a given isotype were averaged to one mean value.

## [1] "No strain issues to report"
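
The replicate averaging described above can be sketched as follows. This is a minimal Python illustration, not the report's own R code: the strain names, isotype names, and the `strain_to_isotype` lookup are made up for the example, and keeping an unknown strain under its own name is an assumption.

```python
from collections import defaultdict
from statistics import mean


def average_by_isotype(records, strain_to_isotype):
    """Collapse replicate trait values to one mean value per isotype.

    records: iterable of (strain, value) pairs.
    strain_to_isotype: dict mapping strain name -> isotype name
    (a hypothetical lookup; the real list is on the isotype page).
    """
    grouped = defaultdict(list)
    for strain, value in records:
        # Assumption: a strain missing from the lookup keeps its own name.
        isotype = strain_to_isotype.get(strain, strain)
        grouped[isotype].append(value)
    return {isotype: mean(values) for isotype, values in grouped.items()}


# Made-up example data: two strains share one isotype, one strain has replicates.
lookup = {"StrainA1": "IsoA", "StrainA2": "IsoA", "StrainB": "IsoB"}
records = [("StrainA1", 10.0), ("StrainA2", 14.0), ("StrainB", 7.0), ("StrainB", 9.0)]
means = average_by_isotype(records, lookup)
```

Each isotype ends up with a single value, which is the form the mapping expects.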

Manhattan plot

A genome-wide association study (GWAS) was performed by testing whether marker genotype differences can explain phenotypic variation. These tests correct for relatedness among individuals in the population using a genomic relatedness matrix (or “kinship matrix”). This analysis was performed with GCTA using two different kinship matrices: one constructed specifically with inbred model organisms in mind (INBRED) and one constructed from all markers except those on the chromosome of the tested marker (“leave-one-chromosome-out”; LOCO). The INBRED kinship matrix corrects more heavily for genetic stratification at the tested marker, while the LOCO kinship matrix does not, and LOCO may therefore increase power in certain contexts.

  • Every dot is a SNV marker.

  • SNVs are colored if they pass the genome-wide corrected significance threshold:

    • The horizontal solid line corresponds to the stricter Bonferroni (BF) threshold, which is based on the number of markers in the analysis.
    • The horizontal dashed line corresponds to the more permissive EIGEN threshold, which corrects for the number of independent markers in your data set. This threshold takes advantage of the extensive LD in C. elegans to limit the number of “unique” markers. (See Zdraljevic et al. 2019 (PMID: 30958264) for more.)
    • If you selected a custom threshold, only this threshold is shown as a dotted line.
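
As a rough illustration of how the two lines relate, the sketch below converts a number of tests into a -log10(p) cutoff. It assumes a family-wise significance level of 0.05, which this report does not state, and the marker counts are hypothetical placeholders rather than values from this mapping.

```python
import math


def neg_log10_threshold(n_tests, alpha=0.05):
    """Return the -log10(p) cutoff when `alpha` is divided across `n_tests` tests.

    alpha=0.05 is an assumed significance level, not taken from the report.
    """
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return -math.log10(alpha / n_tests)


n_markers = 20000      # hypothetical: all markers tested (BF)
n_independent = 800    # hypothetical: independent markers (EIGEN)

bf_line = neg_log10_threshold(n_markers)
eigen_line = neg_log10_threshold(n_independent)
```

Because the number of independent markers is never larger than the total number of markers, the EIGEN line can never sit above the BF line under this calculation.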

Genomic Inflation

The p-values calculated from each marker association test were compared to the theoretical distribution of p-values under the null hypothesis. This comparison is displayed for each chromosome in the quantile-quantile plots (Q-Q plots) below. The genomic inflation factor (λ_GC) estimates the inflation of observed p-values by comparing the median observed test statistic to the median of a theoretical χ² distribution with one degree of freedom (χ²[0.5, 1]). Mappings that produce genomic inflation factors greater than 1.25 may indicate systematic bias, such as strong population stratification of phenotype values.

The genomic inflation factor is 1.5384522 for the INBRED mapping and 1.191966 for the LOCO mapping.
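
A genomic inflation factor of this kind is commonly computed as the median observed χ² statistic divided by the median of a χ² distribution with one degree of freedom. The sketch below shows that common definition in plain Python. It is an illustration under that assumption, not the code that produced the values above, and the p-values are synthetic.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-squared distribution with 1 degree of freedom (about 0.455).
EXPECTED_MEDIAN_CHISQ = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Median observed 1-df chi-squared statistic divided by its expected median."""
    chisq = []
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError("p-values must lie in (0, 1]")
        # With 1 df, chi-squared = z**2, where z is the two-sided normal quantile of p.
        z = _STD_NORMAL.inv_cdf(p / 2.0)
        chisq.append(z * z)
    return median(chisq) / EXPECTED_MEDIAN_CHISQ


# Synthetic p-values: evenly spaced (null-like) and a deliberately inflated set.
uniform = [i / 1000 for i in range(1, 1000)]
inflated = [p ** 2 for p in uniform]

lambda_uniform = genomic_inflation(uniform)
lambda_inflated = genomic_inflation(inflated)
```

Evenly spaced p-values give a factor of 1, while p-values shifted toward zero push the factor above 1, which is the pattern the 1.25 guideline is meant to flag.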

The following sections of the report are shown for mappings performed using both the INBRED and LOCO kinship matrix construction approaches. It is recommended you choose one set of results based on the previous diagnostic plots. These results may vary between different traits.


INBRED

This is the default kinship matrix construction approach, designed for inbred model organisms (See https://yanglab.westlake.edu.cn/software/gcta/#MakingaGRM for more info).



Phenotype by Genotype Split

For each detected QTL, we can compare the phenotypes of the strains with the reference (REF) allele (i.e. same genotype as N2) to the phenotypes of the strains with the alternative (ALT) allele (i.e. genotype different from N2). A QTL is defined as a region where genetic variation is correlated with phenotypic variation, so we expect to see a difference in phenotype between the REF and ALT groups. In a best-case scenario, we like to see a large split between REF and ALT and a good number of strains in both groups. It is also important to ensure that the mean phenotype of neither group is driven by a small number of outlier strains.

A few select strains are highlighted because they are used in Andersen Lab dose response assays.
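
The checks described above (group sizes, size of the split, and sensitivity to outliers) can be sketched with a small summary function. The strain names and values are invented, and comparing the mean with the median is only one simple, assumed way to spot a group driven by a few outliers.

```python
from statistics import mean, median


def summarize_split(phenotypes, genotypes):
    """Summarize phenotype values for the REF and ALT groups at one peak marker.

    phenotypes: dict strain -> trait value.
    genotypes: dict strain -> "REF" or "ALT" (other values are ignored).
    """
    groups = {"REF": [], "ALT": []}
    for strain, value in phenotypes.items():
        allele = genotypes.get(strain)
        if allele in groups:
            groups[allele].append(value)
    summary = {}
    for allele, values in groups.items():
        summary[allele] = {
            "n": len(values),
            "mean": mean(values) if values else None,
            # A mean far from the median hints that outliers drive the group mean.
            "median": median(values) if values else None,
        }
    return summary


# Made-up example data.
phenotypes = {"s1": 1.0, "s2": 2.0, "s3": 3.0, "s4": 10.0, "s5": 12.0}
genotypes = {"s1": "REF", "s2": "REF", "s3": "REF", "s4": "ALT", "s5": "ALT"}
summary = summarize_split(phenotypes, genotypes)
```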

Linkage disequilibrium

If your trait has multiple QTL, we calculate linkage disequilibrium (LD) between them. This is useful because one strong QTL is sometimes in linkage disequilibrium with a secondary QTL, even if the secondary QTL is on another chromosome. In that case, the secondary QTL might not contain a true causal variant, so it is important to check this before narrowing the QTL experimentally.
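
One simple way to quantify LD between two peak markers in inbred strains is the squared Pearson correlation (r²) of their genotypes coded as 0 (REF) and 1 (ALT). The sketch below uses that measure as an assumption; it is not necessarily the statistic or routine used to draw the LD plot in this report, and the genotype vectors are invented.

```python
def ld_r_squared(geno_a, geno_b):
    """Squared Pearson correlation between two markers coded 0 (REF) / 1 (ALT).

    Strains with a missing genotype (None) at either marker are skipped.
    """
    pairs = [(a, b) for a, b in zip(geno_a, geno_b) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two strains genotyped at both markers")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        raise ValueError("r^2 is undefined for a monomorphic marker")
    return cov * cov / (var_a * var_b)


# Made-up genotypes for eight strains at two peak markers.
peak_1 = [0, 0, 1, 1, 0, 1, 0, 1]
peak_2 = [0, 0, 1, 1, 0, 1, 1, 0]
r2 = ld_r_squared(peak_1, peak_2)
```

Values near 1 mean the two peaks carry nearly the same information across strains, which is the situation in which a secondary QTL deserves extra scrutiny.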



QTL region details

II:5132790-5764284

Fine mapping

Fine mapping was performed by evaluating the genotype-phenotype relationship for variants near the QTL identified from GWA mapping. A VCF containing imputed variants was used to avoid removing variants with missing genotype information for one or a few strains. Only SNVs were considered in this mapping.

Each variant is represented by a vertical line, colored by the predicted variant impact (e.g. HIGH impact variants could be variants that introduce a change in the amino acid sequence or a stop-gain). Genes are represented by horizontal lines with an arrow showing the direction of the gene.

This second plot is very similar to the first. Here, each variant is represented by a diamond colored by its linkage to the peak marker (colored in red). This plot can be useful for determining the structure of your region. If you have many variants with high linkage to your peak marker, it is important to remember that any of those variants could be causal.



All variants in interval



Mediation analysis

Mediation analysis was performed to test whether gene expression variation is significantly correlated with the phenotype (overlap of phenotype QTL with expression QTL). Top candidates whose expression might mediate the phenotype QTL are shown below. (Note: the expression data are currently unpublished.) For more information about mediation analysis, see Evans and Andersen 2020 (PMID: 32385045).



Divergent regions

We recently described punctuated hyper-divergent regions in C. elegans (Lee et al. 2021 (DOI: 10.1038/s41559-021-01435-x)). Within these divergent regions, we are less confident about the variant calls and even the gene content between strains. For these reasons, a QTL that falls within a divergent region may complicate your analyses and requires extra-careful interpretation of fine-mapping results.

The following plot shows divergent regions for each strain across the QTL region. Strains are split by genotype at the peak marker. You should be careful if many strains are divergent, especially if most of the strains in the ALT group are divergent, for example.
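
The check suggested above can be sketched as an interval-overlap count per genotype group. The region label follows the format used in this report; the strain names, genotypes, and divergent intervals are invented for illustration, and the input layout is an assumption rather than the format of the report's data tables.

```python
def parse_region(region):
    """Split a region label such as 'II:5132790-5764284' into (chrom, start, end)."""
    chrom, span = region.split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end)


def divergent_fraction(qtl_region, divergent_by_strain, genotype_by_strain):
    """Fraction of strains per genotype group with a divergent interval overlapping the QTL.

    divergent_by_strain: dict strain -> list of (chrom, start, end) intervals.
    genotype_by_strain: dict strain -> "REF" or "ALT" at the peak marker.
    """
    chrom, start, end = parse_region(qtl_region)
    counts = {}
    for strain, allele in genotype_by_strain.items():
        # Two intervals overlap when each one starts before the other ends.
        hit = any(
            c == chrom and s <= end and e >= start
            for c, s, e in divergent_by_strain.get(strain, [])
        )
        total, divergent = counts.get(allele, (0, 0))
        counts[allele] = (total + 1, divergent + int(hit))
    return {allele: divergent / total for allele, (total, divergent) in counts.items()}


# Made-up strains and divergent intervals.
genotype_by_strain = {"sA": "REF", "sB": "REF", "sC": "ALT", "sD": "ALT"}
divergent_by_strain = {
    "sA": [("III", 5200000, 5300000)],   # wrong chromosome, no overlap
    "sC": [("II", 5500000, 5600000)],    # inside the QTL interval
    "sD": [("II", 1000000, 1100000)],    # same chromosome, outside the interval
}
fractions = divergent_fraction("II:5132790-5764284", divergent_by_strain, genotype_by_strain)
```

A high fraction in one group, particularly the ALT group, is the case the paragraph above warns about.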



Haplotype

The following plot shows the genome-wide haplotype (genetic relatedness) of mapped strains split by REF or ALT genotype. This plot can be useful to help identify how many unique haplotypes are present in the REF or ALT groups. If you want to choose parent strains for a NIL cross to validate this QTL, you might want to choose strains in the major haplotype of the REF/ALT groups that also have distinct phenotypes.


II:8180776-9357116

Fine mapping

Fine mapping was performed by evaluating the genotype-phenotype relationship for variants near the QTL identified from GWA mapping. A VCF containing imputed variants was used to avoid removing variants with missing genotype information for one or a few strains. Only SNVs were considered in this mapping.

Each variant is represented by a vertical line, colored by the predicted variant impact (e.g. HIGH impact variants could be variants that introduce a change in the amino acid sequence or a stop-gain). Genes are represented by horizontal lines with an arrow showing the direction of the gene.

This second plot is very similar to the first. Here, each variant is represented by a diamond colored by its linkage to the peak marker (colored in red). This plot can be useful for determining the structure of your region. If you have many variants with high linkage to your peak marker, it is important to remember that any of those variants could be causal.



All variants in interval



Mediation analysis

Mediation analysis was performed to test whether gene expression variation is significantly correlated with the phenotype (overlap of phenotype QTL with expression QTL). Top candidates whose expression might mediate the phenotype QTL are shown below. (Note: the expression data are currently unpublished.) For more information about mediation analysis, see Evans and Andersen 2020 (PMID: 32385045).



Divergent regions

We recently described punctuated hyper-divergent regions in C. elegans (Lee et al. 2021 (DOI: 10.1038/s41559-021-01435-x)). Within these divergent regions, we are less confident about the variant calls and even the gene content between strains. For these reasons, a QTL that falls within a divergent region may complicate your analyses and requires extra-careful interpretation of fine-mapping results.

The following plot shows divergent regions for each strain across the QTL region. Strains are split by genotype at the peak marker. You should be careful if many strains are divergent, especially if most of the strains in the ALT group are divergent, for example.



Haplotype

The following plot shows the genome-wide haplotype (genetic relatedness) of mapped strains split by REF or ALT genotype. This plot can be useful to help identify how many unique haplotypes are present in the REF or ALT groups. If you want to choose parent strains for a NIL cross to validate this QTL, you might want to choose strains in the major haplotype of the REF/ALT groups that also have distinct phenotypes.


II:10733554-11733188

Fine mapping

Fine mapping was performed by evaluating the genotype-phenotype relationship for variants near the QTL identified from GWA mapping. A VCF containing imputed variants was used to avoid removing variants with missing genotype information for one or a few strains. Only SNVs were considered in this mapping.

Each variant is represented by a vertical line, colored by the predicted variant impact (e.g. HIGH impact variants could be variants that introduce a change in the amino acid sequence or a stop-gain). Genes are represented by horizontal lines with an arrow showing the direction of the gene.

This second plot is very similar to the first. Here, each variant is represented by a diamond colored by its linkage to the peak marker (colored in red). This plot can be useful for determining the structure of your region. If you have many variants with high linkage to your peak marker, it is important to remember that any of those variants could be causal.



All variants in interval



Mediation analysis

Mediation analysis was performed to test whether gene expression variation is significantly correlated with the phenotype (overlap of phenotype QTL with expression QTL). Top candidates whose expression might mediate the phenotype QTL are shown below. (Note: the expression data are currently unpublished.) For more information about mediation analysis, see Evans and Andersen 2020 (PMID: 32385045).



Divergent regions

We recently described punctuated hyper-divergent regions in C. elegans (Lee et al. 2021 (DOI: 10.1038/s41559-021-01435-x)). Within these divergent regions, we are less confident about the variant calls and even the gene content between strains. For these reasons, a QTL that falls within a divergent region may complicate your analyses and requires extra-careful interpretation of fine-mapping results.

The following plot shows divergent regions for each strain across the QTL region. Strains are split by genotype at the peak marker. You should be careful if many strains are divergent, especially if most of the strains in the ALT group are divergent, for example.



Haplotype

The following plot shows the genome-wide haplotype (genetic relatedness) of mapped strains split by REF or ALT genotype. This plot can be useful to help identify how many unique haplotypes are present in the REF or ALT groups. If you want to choose parent strains for a NIL cross to validate this QTL, you might want to choose strains in the major haplotype of the REF/ALT groups that also have distinct phenotypes.


III:1997851-2735727

Fine mapping

Fine mapping was performed by evaluating the genotype-phenotype relationship for variants near the QTL identified from GWA mapping. A VCF containing imputed variants was used to avoid removing variants with missing genotype information for one or a few strains. Only SNVs were considered in this mapping.

Each variant is represented by a vertical line, colored by the predicted variant impact (e.g. HIGH impact variants could be variants that introduce a change in the amino acid sequence or a stop-gain). Genes are represented by horizontal lines with an arrow showing the direction of the gene.

This second plot is very similar to the first. Here, each variant is represented by a diamond colored by its linkage to the peak marker (colored in red). This plot can be useful for determining the structure of your region. If you have many variants with high linkage to your peak marker, it is important to remember that any of those variants could be causal.



All variants in interval



Mediation analysis

Mediation analysis was performed to test whether gene expression variation is significantly correlated with the phenotype (overlap of phenotype QTL with expression QTL). Top candidates whose expression might mediate the phenotype QTL are shown below. (Note: the expression data are currently unpublished.) For more information about mediation analysis, see Evans and Andersen 2020 (PMID: 32385045).



Divergent regions

We recently described punctuated hyper-divergent regions in C. elegans (Lee et al. 2021 (DOI: 10.1038/s41559-021-01435-x)). Within these divergent regions, we are less confident about the variant calls and even the gene content between strains. For these reasons, a QTL that falls within a divergent region may complicate your analyses and requires extra-careful interpretation of fine-mapping results.

The following plot shows divergent regions for each strain across the QTL region. Strains are split by genotype at the peak marker. You should be careful if many strains are divergent, especially if most of the strains in the ALT group are divergent, for example.



Haplotype

The following plot shows the genome-wide haplotype (genetic relatedness) of mapped strains split by REF or ALT genotype. This plot can be useful to help identify how many unique haplotypes are present in the REF or ALT groups. If you want to choose parent strains for a NIL cross to validate this QTL, you might want to choose strains in the major haplotype of the REF/ALT groups that also have distinct phenotypes.







LOCO

LOCO may provide increased power to detect QTL because it does not correct for relatedness (or stratification) on the chromosome of each tested marker, sometimes providing higher power to detect linked QTL or QTL within divergent regions. However, this higher power also comes with higher false discovery rates. For more info, check out Widmayer et al. 2021 (https://www.biorxiv.org/content/10.1101/2021.09.09.459688v1).

## [1] "No QTL were identified with the LOCO algorithm."

Please kindly cite the following publications

  • Widmayer SJ, Evans KS, Zdraljevic S, and Andersen EC (2021) Evaluating the power and limitations of genome-wide association mapping in C. elegans. bioRxiv 2021.09.09.459688.
  • Lee D, Zdraljevic S, Stevens L, Wang Y, Tanny RE, Crombie TA, Cook DE, Webster AK, Chirakar R, Baugh LR, Sterken M, Braendle C, Felix M-A, Rockman MV, and Andersen EC (2021) Balancing selection maintains ancient genetic diversity in C. elegans. Nature Ecology and Evolution, Apr 5; DOI: 10.1038/s41559-021-01435-x.
  • Zdraljevic S, Fox BW, Strand C, Panda O, Tenjo-Castano FJ, Brady SC, Crombie TA, Doench JG, Schroeder FC, and Andersen EC (2019) Natural variation in arsenic toxicity is explained by differences in branched chain amino acid catabolism. eLife, Apr 8;8: e40260.
  • Cook DE, Zdraljevic S, Roberts JP, Andersen EC (2016) CeNDR, the Caenorhabditis elegans Natural Diversity Resource. Nucleic Acids Research, Jan 4; 45(D1):D650-D657.



## R version 3.6.0 (2019-04-26)
## Platform: x86_64-conda_cos6-linux-gnu (64-bit)
## Running under: Debian GNU/Linux 10 (buster)
## 
## Matrix products: default
## BLAS/LAPACK: /opt/conda/lib/R/lib/libRblas.so
## 
## locale:
##  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
##  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
##  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
## [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] cowplot_0.9.1      ggnewscale_0.4.5   genetics_1.3.8.1.2
##  [4] mvtnorm_1.0-10     MASS_7.3-51.3      gtools_3.8.1      
##  [7] gdata_2.18.0       combinat_0.0-8     ggrepel_0.9.1     
## [10] knitr_1.22         ggbeeswarm_0.6.0   DT_0.5            
## [13] plotly_4.9.0       purrr_0.3.4        glue_1.3.1        
## [16] readr_1.3.1        stringr_1.4.0      ggplot2_3.3.3     
## [19] tidyr_1.1.3        dplyr_0.8.5       
## 
## loaded via a namespace (and not attached):
##  [1] beeswarm_0.2.3     tidyselect_1.1.0   xfun_0.6          
##  [4] colorspace_1.4-1   vctrs_0.3.8        htmltools_0.3.6   
##  [7] viridisLite_0.3.0  yaml_2.2.0         utf8_1.1.4        
## [10] rlang_0.4.11       later_0.8.0        pillar_1.6.1      
## [13] withr_2.1.2        RColorBrewer_1.1-2 lifecycle_1.0.0   
## [16] munsell_0.5.0      gtable_0.3.0       htmlwidgets_1.3   
## [19] evaluate_0.13      labeling_0.3       httpuv_1.5.1      
## [22] crosstalk_1.0.0    vipor_0.4.5        fansi_0.4.0       
## [25] Rcpp_1.0.1         xtable_1.8-4       promises_1.0.1    
## [28] scales_1.0.0       jsonlite_1.6       mime_0.6          
## [31] hms_1.1.0          digest_0.6.18      stringi_1.4.3     
## [34] shiny_1.3.2        grid_3.6.0         tools_3.6.0       
## [37] magrittr_1.5       lazyeval_0.2.2     tibble_3.1.2      
## [40] crayon_1.3.4       pkgconfig_2.0.2    ellipsis_0.3.2    
## [43] data.table_1.12.2  assertthat_0.2.1   rmarkdown_1.12    
## [46] httr_1.4.2         R6_2.4.0           compiler_3.6.0